Model for English-Urdu Statistical Machine Translation
Abstract
There are more than 60 million first-language speakers of Urdu and more than 104 million second-language speakers. Much of the knowledge on the internet that would be useful to these speakers is available only in English. The typological contrast between the two languages makes the pair interesting to study for Statistical Machine Translation. However, almost no parallel aligned data is freely available for the selected language pair (English-Urdu). In this paper we discuss the issues of corpus alignment and share the results of a baseline system built with the Moses decoder and other supporting tools.
Related resources
Word-Order Issues in English-to-Urdu Statistical Machine Translation
We investigate phrase-based statistical machine translation between English and Urdu, two Indo-European languages that differ significantly in their word-order preferences. Reordering of words and phrases is thus a necessary part of the translation process. While local reordering is modeled nicely by phrase-based systems, long-distance reordering is known to be a hard problem. We perform experi...
Development of Parallel Corpus and English to Urdu Statistical Machine Translation
In this paper we describe our efforts to develop a parallel corpus for statistical machine translation of English text into Urdu. We also share and discuss the issues encountered during this effort.
Creation of comparable corpora for English-Urdu, Arabic, Persian
Statistical Machine Translation (SMT) relies on the availability of rich parallel corpora. However, in the case of under-resourced languages or some specific domains, parallel corpora are not readily available. This leads to under-performing machine translation systems in those sparse data settings. To overcome the low availability of parallel resources the machine translation community has rec...
Urdu and Hindi: Translation and sharing of linguistic resources
Hindi and Urdu share a common phonology, morphology and grammar but are written in different scripts. In addition, the vocabularies have diverged significantly, especially in the written form. In this paper we show that we can get reasonable quality translations (we estimated the Translation Error Rate at 18%) between the two languages even in the absence of a parallel corpus. Linguistic resour...
AGHAZ: An Expert System Based approach for the Translation of English to Urdu
Machine Translation (MT) of English text to its Urdu equivalent is a difficult challenge. Many attempts have been made, but only a few limited solutions have been provided so far. We present a direct approach, using an expert system to translate English text into its equivalent Urdu, using The Unicode Standard, Version 4.0 (ISBN 0-321-18578-1) Range: 0600–06FF. The expert system works with a knowledg...